Domain-independent Model for Chemical Compound and Drug Name Recognition

Authors

  • Utpal Kumar Sikdar
  • Asif Ekbal
  • Sriparna Saha
Abstract

This paper briefly describes the work we carried out as part of our participation in the BioCreative-IV Track-2 shared task on chemical compound and drug name recognition. We submit five runs, all of which are based on machine learning approaches: Conditional Random Field (CRF), Support Vector Machine (SVM) and a simple ensemble technique. Our system is domain-independent in the sense that it does not use any domain-specific external resources or tools. Here we report evaluation results only for those runs in which the development set is not included in training. We obtain the best performance with a CRF-based model, which achieves micro-averaged recall, precision and F-score values of 72.80%, 75.82% and 74.28%, respectively. The same model yields macro-averaged recall, precision and F-score values of 73.96%, 74.22% and 72.47%, respectively.
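The difference between the micro- and macro-averaged figures can be illustrated with a minimal sketch. It assumes that micro-averaging pools true positives, false positives and false negatives over all documents, and that macro-averaging takes the mean of per-document scores; the abstract does not spell out either definition, so both are assumptions here. The per-document counts and all function names are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall (0.0 if both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F-score from raw counts; 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_score(precision, recall)


def micro_average(docs: List[Counts]) -> Tuple[float, float, float]:
    """Pool the counts over all documents, then compute one set of scores."""
    tp = sum(d[0] for d in docs)
    fp = sum(d[1] for d in docs)
    fn = sum(d[2] for d in docs)
    return prf(tp, fp, fn)


def macro_average(docs: List[Counts]) -> Tuple[float, float, float]:
    """Score each document separately, then take the unweighted mean of each score."""
    scores = [prf(*d) for d in docs]
    n = len(scores)
    return (
        sum(s[0] for s in scores) / n,
        sum(s[1] for s in scores) / n,
        sum(s[2] for s in scores) / n,
    )


# Hypothetical per-document counts, for illustration only.
docs = [(8, 2, 2), (1, 1, 3), (5, 0, 5)]

micro_p, micro_r, micro_f = micro_average(docs)
macro_p, macro_r, macro_f = macro_average(docs)

# The reported micro F-score is the harmonic mean of the reported
# micro precision (75.82) and recall (72.80).
reported_micro_f = round(f_score(75.82, 72.80), 2)  # 74.28
```

Under these assumptions the macro F-score is a mean of per-document F-scores, so it need not equal the harmonic mean of the macro precision and recall. That would be consistent with the reported macro F-score (72.47%) lying below both the macro recall and the macro precision, although the abstract does not confirm the averaging scheme.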


Similar resources

CHEMDNER system with mixed conditional random fields and multi-scale word clustering

BACKGROUND: Chemical compound and drug name recognition plays an important role in chemical text mining, and it is the basis for automatic relation extraction and event identification in chemical information processing. A high-performance named entity recognition system for chemical compound and drug names is therefore necessary. METHODS: We developed a CHEMDNER system based on mixed conditional r...


DBCHEM: A Database Query Based Solution for the Chemical Compound and Drug Name Recognition Task

We propose a method, named DBCHEM, based on database queries for the chemical compound and drug name recognition task of the BioCreative IV challenge. We prepared a database with 145 million entries containing compound and drug names, their synonyms, and molecular formulas. The PubChem Power User Gateway (PUG) system is used to construct the database. Candidate chemical and drug names are identifie...


Chemistry-specific Features and Heuristics for Developing a CRF-based Chemical Named Entity Recogniser

We describe and compare methods developed for the BioCreative IV chemical compound and drug name recognition (CHEMDNER) task. The presented conditional random fields (CRF)-based named entity recogniser employs a statistical model trained on domain-specific features, in addition to those typically used in biomedical NERs. In order to increase recall, two heuristics-based post-processing steps we...


Effects of carbon nanotubes on properties of the fluorouracil anticancer drug: DFT studies of a CNT-fluorouracil compound

Density functional theory (DFT) calculations were performed to investigate the effects of a carbon nanotube (CNT) on the properties of the fluorouracil (F-Uracil) anticancer drug. To this end, a molecular model including both F-Uracil and CNT molecules was created to represent the CNT@F-Uracil compound. The optimized parameters indicated that the new compound could show new proper...


A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three methods have conventionally been used to extract named entities from a text: rule-based, machine-learning-based, and a hybrid of the two. Machine-learning-based methods perform well in the Persian language if they are trained with good features. To get good performanc...




Publication date: 2013